Abstract

Different aspects of protein folding are illustrated by simplified
polymer models. Stressing the diversity of side chains (residues) leads
one to view folding as the freezing transition of a heteropolymer.
Technically, the most common approach to diversity is randomness, which
is usually implemented in two-body interactions (charges, polar
character, ...). On the other hand, the (almost) universal character of the
protein backbone suggests that folding may also be viewed as the
crystallization transition of a homopolymeric chain, the main
ingredients of which are the peptide bond and chirality (proline and glycine
notwithstanding). The model of a chiral dipolar chain leads to a
unified picture of secondary structures, and to a possible connection of
protein structures with ferroelectric domain theory.
1 Introduction



Proteins are polymers, which have the property of folding reversibly into a 
single geometrical shape with biological activity. The folding process can be 
modeled as a phase transition from a high temperature coil phase to a low 
temperature compact phase (other parameters, e.g. pH, may also trigger the 
transition). Both phases require a very detailed description of the monomers,
not to mention the chemistry of water. These notes aim at presenting some 
of the theoretical approaches to the folding transition. The interested reader 
can find more details in the short list of references at the end of the paper. 

Section 2 gives a brief description of the twenty monomers (amino acids)
which are the building blocks of proteins. A protein of N amino acids can be
characterized by its primary structure, i.e. by specifying which amino acid is
actually at position i along the chain (with i = 1, 2, ..., N and N of the order
of a few hundred).

At first sight, a protein may be considered as a heteropolymer, with amino
acid i being characterized by its electric charge ($q_i$), its hydrophilicity
($\lambda_i$), and so on. A closer look at a protein chain reveals that this
heteropolymer is made of an almost periodic backbone and of different side
chains (residues). This backbone is almost protein independent, leading to a
homopolymeric approach to the folding transition. This homopolymeric view is
supported by the ubiquitous existence of helices and sheets in proteins (which
are called the secondary structures). Folded proteins illustrate the
coexistence of specific features (primary sequence, type of biological
activity, ...) and universal features [1, 2] such as helices and sheets (one
may also speculate about the universal character of other issues, such as the
very existence of a biological activity or the aggregated structures in
amyloid-like diseases [3]).

Some of these properties will be illustrated, mostly in a pictorial way, on
a small protein (1aps, N = 98) in Section 3. This example shows that one is
not concerned with the theorists' thermodynamic limit: proteins have a finite
size, pointing towards an important role of the surface, and therefore of the
solvent (i.e. water for globular proteins). More numbers pertaining to the
folding process will be given in Section 4.

I think it is fair to say that the homopolymeric aspects of proteins have 
been so far less studied. Since there are numerous reviews on their het- 
eropolymeric properties, I will mostly deal here with homopolymeric models, 
and conclude on the interplay between both types of properties. 
2 Chemistry [J, Q|



• A brief description of the monomers 

— There are twenty different types of monomers 

ASP, GLU, LYS, ARG, ALA, VAL, PHE, PRO, MET, ILE, LEU, 
SER, THR, TYR, HIS, CYS, ASN, GLN, TRP, GLY 

— These monomers are amino acids (exception: PRO) 

H2N - CαHR - COOH

e.g. 

ALA: R = CH3

GLU: R = (CH2)2 - COO-

— The amino acids are chiral (exception: GLY) 

H2N - CαHR - COOH

Looking along the Cα-H bond towards the Cα atom, one sees the
CO, R, N groups in clockwise order.

— Residues R have different properties 

H2N - CαHR - COOH

* Charged residues ASP (-), GLU (-), LYS (+), ARG (+) 

* Polar residues TYR, HIS, ASN, GLN 

* Rather polar residues PRO, THR, ALA, GLY, SER 

* Hydrophobic residues VAL, PHE, MET, ILE, LEU, TRP, 
CYS 

• From monomer to polymer 

— Formation of the peptide bond 

Geometrical constraints (stemming from quantum chemistry) imply
that consecutive Cα-CO-NH-Cα atoms are coplanar, with the C-O
(and N-H) bonds being roughly perpendicular to the virtual Cα-Cα bond.

At long distance, the charge distribution on the peptide bond is
dominated, in a first approximation, by an electric dipole parallel
to OC and of order four Debyes. At shorter distances, this charge
distribution gives rise to hydrogen bonds.

Figure 1: Peptide bond: the Cα-Cα virtual bond is roughly perpendicular
to the CO and NH bonds.

With the exception of residues GLY and PRO, the backbone chain
can be written

...NH-Cα-CO-NH-Cα-CO-NH-Cα-CO-NH-Cα-CO...
Figure 2: Defining the dihedral angle 1234 for atoms 1, 2, 3, 4

— Chirality of the main chain 

The chirality of the amino acids, together with the steric
constraints, gives the main backbone chain some sort of chirality.
More precisely, taking four atoms 1234 along the main backbone
chain, the associated dihedral angle 1234 is defined as the angle
between the (123) and (234) planes. It is shown in Figure 2, with the 2-3
bond coming out of the paper. The convention is such that the
dihedral angle 1234 shown there is negative. Note that 1234 = 4321.

The chirality of the chain is associated with the fact that positive
and negative dihedral angles do not have the same steric
constraints and therefore not the same (free) energies. In particular,
Ramachandran plots for a given residue are given in terms of $\phi$ = 1234
(resp. $\psi$) where 1234 = C-N-Cα-C (resp. 1234 = N-Cα-C-N). Except
for GLY, these plots show that helices and sheets correspond to
rather well defined (and non-symmetric) regions in ($\phi$, $\psi$) space.

One may also define Ramachandran's plots for side chains, but we 
will neglect their role in this paper. 

So with the (important) exceptions of residues GLY and PRO, the main 
backbone chain can be represented in a homopolymeric way (see e.g. 
[3]). One may then consider the chiral (hydrogen-bonding+dipolar) 
chain as a good description of secondary structures in proteins. 
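
The dihedral angle defined above can be computed from four atomic positions. The following is a minimal sketch, not the convention of any particular software package: it assumes the usual IUPAC sign rule (positive when, looking from atom 2 to atom 3, the 3-4 bond is rotated clockwise relative to the 2-1 bond), and the four points are hypothetical test coordinates rather than atoms of a real protein. It also checks numerically that 1234 = 4321 and that a mirror image changes the sign, which is the sense in which the dihedral angle encodes chirality.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p1, p2, p3, p4):
    """Signed angle (radians) between the planes (123) and (234)."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals to the two planes
    x = dot(n1, n2)
    y = dot(cross(n1, n2), b2) / math.sqrt(dot(b2, b2))
    return math.atan2(y, x)

# Hypothetical test geometry (assumption, not protein coordinates).
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
d_plus = (0.0, 1.0, 1.0)
d_minus = (0.0, -1.0, 1.0)   # mirror image of d_plus in the plane y = 0

angle_plus = math.degrees(dihedral(a, b, c, d_plus))
angle_minus = math.degrees(dihedral(a, b, c, d_minus))
angle_reversed = math.degrees(dihedral(d_plus, c, b, a))   # atoms read as 4321
```

With real coordinates, $\phi$ and $\psi$ would be obtained by passing the C, N, Cα, C (resp. N, Cα, C, N) positions of consecutive residues to `dihedral`.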
3 Example

I will illustrate some of the previous points on a specific protein (1aps).

• Primary structure (N = 98 residues)

(i=1) SER-THR-ALA-ARG-PRO-LEU-LYS-SER-VAL-ASP-TYR-GLU-VAL-
-PHE-GLY-ARG-VAL-GLN-GLY-VAL-CYS-PHE-ARG-MET-TYR-ALA-
-GLU-ASP-GLU-ALA-ARG-LYS-ILE-GLY-VAL-VAL-GLY-TRP-VAL-
-LYS-ASN-THR-SER-LYS-GLY-THR-VAL-THR-GLY-GLN-VAL-GLN-
-GLY-PRO-GLU-GLU-LYS-VAL-ASN-SER-MET-LYS-SER-TRP-LEU-
-SER-LYS-VAL-GLY-SER-PRO-SER-SER-ARG-ILE-ASP-ARG-THR-
-ASN-PHE-SER-ASN-GLU-LYS-THR-ILE-SER-LYS-LEU-GLU-TYR-
-SER-ASN-PHE-SER-VAL-ARG-TYR (i=98)
Figure 3: Native space-filling structure of 1aps



Figure 4: Backbone of 1aps. Helices: 1 (PHE 22-ILE 33) and 2 (GLU 55-LEU
65). Sheet: strand 1 (LYS 7-VAL 13); strand 2 (VAL 36-THR 42); strand
3 (THR 46-GLY 53); strand 4 (ARG 77-THR 85); strand 5 (ASN 93-ARG
97).

Figure 5: Charged residues of 1aps

• Compactness of the folded structure (Figure 3)

The total number of atoms is of order one thousand, linking protein 
folding with cluster physics (with a chain constraint) jHllEj- 

• Charged residues (Figure 5)

There are 26 charged residues (three ASP, seven GLU, seven ARG and
nine LYS). For electrostatic reasons, they are located on the protein
surface. More generally, since four (out of twenty) residue types are
charged in usual conditions, and assuming equipartition, a protein of
N residues has N/5 charged residues to be placed on the surface (which
scales like $N^{2/3}$), leading to an estimate N ~ 125 for a typical single
domain protein.

• The main backbone chain (Figures 4, 6, 7)

As mentioned above, the main chain is homopolymeric with the ex- 
ception of three PRO and eight GLY residues. This homopolymeric 
character is illustrated by the CO bonds. The (roughly) ferroelectric 
order of helices and the (roughly) antiferroelectric character of sheets 
result from hydrogen bonding and dipolar-like forces. 
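
The residue counts quoted in this section can be checked directly against the primary structure. The sketch below simply copies the 98-residue sequence listed above and counts residue types; the variable names are illustrative, and the last line reproduces the surface estimate by solving N/5 = N^(2/3) with all numerical prefactors set to one (an assumption of the estimate itself).

```python
from collections import Counter

# Primary structure of 1aps, copied from the list given in this section.
SEQUENCE_1APS = """
SER THR ALA ARG PRO LEU LYS SER VAL ASP TYR GLU VAL
PHE GLY ARG VAL GLN GLY VAL CYS PHE ARG MET TYR ALA
GLU ASP GLU ALA ARG LYS ILE GLY VAL VAL GLY TRP VAL
LYS ASN THR SER LYS GLY THR VAL THR GLY GLN VAL GLN
GLY PRO GLU GLU LYS VAL ASN SER MET LYS SER TRP LEU
SER LYS VAL GLY SER PRO SER SER ARG ILE ASP ARG THR
ASN PHE SER ASN GLU LYS THR ILE SER LYS LEU GLU TYR
SER ASN PHE SER VAL ARG TYR
""".split()

CHARGED = ("ASP", "GLU", "ARG", "LYS")

counts = Counter(SEQUENCE_1APS)
n_residues = len(SEQUENCE_1APS)
n_charged = sum(counts[name] for name in CHARGED)
n_non_homopolymeric = counts["PRO"] + counts["GLY"]

# Surface estimate: N / 5 = N**(2/3)  =>  N**(1/3) = 5  =>  N = 125.
typical_size = 5 ** 3

print(n_residues, n_charged, {name: counts[name] for name in CHARGED})
print("PRO:", counts["PRO"], "GLY:", counts["GLY"], "estimate:", typical_size)
```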
I could not find the active site of 1aps on the web. From a physical point
of view, one would like to understand how the primary sequence somehow 
encodes the native structure (to further extract the active site from the native 
structure is not easy). 

4 Numbers [J, Qj 

• Energy scales 

At room temperature T, the equivalence between chemical and physical
units is $k_B T \simeq 0.6$ kcal/mole $\simeq 1/40$ eV/particle. A hydrogen
bond has an energy of order 2-6 kcal/mole. A covalent bond has an energy of
order 50-200 kcal/mole. In the folded state, two consecutive peptide
bonds have an energy of order 1-3 kcal/mole (obtained from
$\frac{p_0^2}{4\pi\epsilon_0\epsilon_r d^3}$, with $p_0 \sim 4$ Debyes,
$d \sim 4$ Å, and where $\epsilon_r$ is believed to be of order
2-5). Finally, van der Waals attraction energies are of order 0.3-1
kcal/mole.

The folding transition is first order, with an entropy loss of order $k_B$
or less per residue (possibly raising questions on the applicability of
Classical Statistical Mechanics).

The dynamics of the folding is governed by energy barriers: folding times
range from $10^{-3}$ s to 1 s, and even longer; this should be
compared with microscopic times of order $10^{-15}$ to $10^{-13}$ s. This
suggests that the phase space of a protein may have many trapping local
minima, implying problems in numerical simulations.
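
The orders of magnitude quoted above can be reproduced with a few lines of arithmetic. This is a rough sketch under stated assumptions: room temperature is taken as T = 300 K, the dipolar energy is the bare estimate $p_0^2/(4\pi\epsilon_0\epsilon_r d^3)$ with all angular factors of order one dropped, and the unit-conversion constants are standard values rounded to a few digits. The function names are illustrative.

```python
K_B_KCAL = 0.0019872   # Boltzmann constant in kcal/(mol K)
K_B_EV = 8.617e-5      # Boltzmann constant in eV/K
COULOMB = 332.06       # e^2 / (4 pi eps0) in kcal * Angstrom / mol
DEBYE = 0.20819        # one Debye expressed in e * Angstrom

def thermal_energy(temperature=300.0):
    """Return k_B T in (kcal/mol, eV); T = 300 K is an assumption."""
    return K_B_KCAL * temperature, K_B_EV * temperature

def dipole_energy(p_debye=4.0, d_angstrom=4.0, eps_r=1.0):
    """Order-of-magnitude dipole-dipole energy p^2 / (eps_r d^3) in kcal/mol."""
    p = p_debye * DEBYE
    return COULOMB * p * p / (eps_r * d_angstrom ** 3)

kt_kcal, kt_ev = thermal_energy()
e_low_screening = dipole_energy(eps_r=2.0)
e_high_screening = dipole_energy(eps_r=5.0)

print("k_B T:", kt_kcal, "kcal/mol,", kt_ev, "eV")
print("dipolar energy for eps_r = 2 and 5:", e_low_screening, e_high_screening)
```

The resulting dipolar energies are comparable to $k_B T$ and to the lower end of the hydrogen-bond range, which is the point of the comparison made in the text.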

• Geometry and Energy 

A typical protein has something like N ~ 100-500 residues (the number
of atoms being of order a few thousand). Something like half of
the residues belong to the surface. A typical linear size of the folded
molecule is R ~ 50 Å. As can be seen from the examples of 1aps and
other proteins, the typical length of a helix is 10-20 residues, that of a
strand being smaller (perhaps 5-8 residues). In broad terms, proteins
are clusters with a chain constraint (and a solvent). Geometrical
constraints (bond lengths, valence angles, van der Waals radii, chirality, ...)
play an important role in proteins. The all-atom CHARMM energy E
commonly used in numerical simulations is given by [7]



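$$E = \sum_{\rm bonds} k_b (b - b_0)^2 + \sum_{\rm angles} k_\theta (\theta - \theta_0)^2 + \sum_{\rm dihedrals} k_\phi \left(1 + \cos(n\phi - \delta)\right) + \sum_{\rm impropers} k_\omega (\omega - \omega_0)^2 + \sum_{i<j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{332\, q_i q_j}{\epsilon_r r_{ij}} \right) \qquad (1)$$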



where distances are in Ångströms, angles in radians and E in kcal/mol.
Studying the dielectric permittivity $\epsilon_r$ is a difficult task; it is
believed to be of order 2-5 in a folded protein.

• (Bio)chemistry

Before considering simplified models, let me recall a few facts about 
real proteins. One should be cautious about in vivo vs in vitro folding 
(role of chaperone molecules). Real proteins are not random polymers, 
but have been selected by evolution. Since (quantum) hydrogen bonds are
important in proteins, the use of Classical Statistical Mechanics may 
be questioned. To this discouraging list, one may add that water (for 
globular proteins) is a complicated, strongly structured solvent. We 
will not even mention the biochemical activity (e.g. the recognition of 
the active site by a ligand). The distance between real proteins and 
simplified models is not to be underestimated. 



5 Simplified models: homopolymeric approach 

5.1 Modeling hydrogen bonds in a compact phase 

The discovery of helices and sheets by Pauling and Corey [8, 9] relies on the
existence of short range hydrogen bonds. Let us consider first an all-helix 
folded protein. What we have here is a competition between local and global 
orders: the former favors helical (i.e. one dimensional) structures (figure 8), 
whereas the latter favors compact (i.e. three dimensional) structures. 
Figure 8: Hydrogen bonds in a helix



This competition can be modeled as follows. We consider a polymer chain
on a cubic lattice, where the monomers interact with an attractive van der
Waals energy $\epsilon_v$ (to ensure compactness at low temperature), and a
curvature energy $\epsilon_h$, which favors the alignment of two consecutive
monomers. In this model, a monomer represents a helical turn of the protein
(two consecutive helical turns allowing for the presence of hydrogen bonds).

The phase diagram depends on the ratio $\epsilon_h/\epsilon_v$; for
simplicity, we will restrict the discussion to fully compact structures
($\epsilon_v = \infty$). The case $\epsilon_h = 0$ is of interest in the
physics of hydrophobic chains at temperatures below the $\theta$ point, which
are known to possess a large entropy in the compact phase. We will first study
this model, and then consider the influence of the curvature term
$\epsilon_h$. Technical details are postponed to an Appendix.

5.1.1 Entropy of a hydrophobic chain [10]

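We first define a Hamiltonian Path (HP) as a fully compact self avoiding walk
(SAW), i.e. a walk that visits every site of the lattice exactly once. The
number of Hamiltonian Paths on the lattice is formally described by

$$\mathcal{N}_N = \sum_{(HP)} 1$$

Introducing an n-component field $\vec\varphi_r$ at each point $r$ of the
lattice, and using the properties of the limit $n \to 0$, it is shown in the
Appendix that

$$\mathcal{N}_N = \lim_{n\to 0}\frac{1}{n}\int \mathcal{D}\vec\varphi_r\; e^{-\frac{1}{2}\sum_{r,r'}\vec\varphi_r\,(\Delta^{-1})_{rr'}\,\vec\varphi_{r'}}\;\prod_r\left(\frac{1}{2}\vec\varphi_r^{\,2}\right) \qquad (2)$$

where $\Delta_{rr'} = 1$ if $r$ and $r'$ are nearest neighbour sites, and 0 otherwise.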
A homogeneous and isotropic saddle point evaluation ($\vec\varphi_r = \vec\varphi$) yields

$$\mathcal{N}_N = \left(\frac{q}{e}\right)^N \qquad (3)$$

with $q = 2d = 6$.

This calculation suggests that a collapsed homopolymeric chain is a
mixture of an exponentially large number of conformations, and is therefore not
able to describe the single conformation of a native protein. For this reason,
the folding transition is commonly thought to be quite different from the
$\theta$ transition. Note however that equation (3) depends on the use of a
homogeneous saddle point in equation (2), which is consistent with the use of
periodic boundary conditions. More general boundary conditions (corresponding
to non-homogeneous $\vec\varphi_r$) would give an expression of the form

$$\mathcal{N}_N \sim A\,\mu^N\,\mu_1^{N^{2/3}} \qquad (4)$$

where A is a constant, $\mu$ and $\mu_1$ being respectively bulk and surface
connectivities.
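
To get a feeling for the counting problem, Hamiltonian paths can be enumerated by brute force on very small lattices. The sketch below is an illustration, not the field-theoretic calculation of the Appendix: it assumes a two-dimensional square grid with open boundaries (rather than the cubic lattice with periodic boundary conditions implied by the saddle point), counts directed paths, and prints the value $(q/e)^N$ with q = 2d = 4 next to the exact count. Agreement is not expected at such sizes, where surface effects of the kind appearing in equation (4) dominate.

```python
import math

def count_hamiltonian_paths(lx, ly):
    """Count directed Hamiltonian paths on an lx x ly grid (open boundaries)."""
    n_sites = lx * ly
    visited = [[False] * ly for _ in range(lx)]

    def extend(x, y, n_visited):
        # Depth-first search: try to extend the walk to each free neighbour.
        if n_visited == n_sites:
            return 1
        total = 0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < lx and 0 <= ny < ly and not visited[nx][ny]:
                visited[nx][ny] = True
                total += extend(nx, ny, n_visited + 1)
                visited[nx][ny] = False
        return total

    total = 0
    for x in range(lx):
        for y in range(ly):
            visited[x][y] = True
            total += extend(x, y, 1)
            visited[x][y] = False
    return total

def saddle_point_estimate(n_sites, d=2):
    """Homogeneous saddle point value (q/e)^N with q = 2d."""
    return (2 * d / math.e) ** n_sites

for size in (2, 3, 4):
    exact = count_hamiltonian_paths(size, size)
    print(size, exact, saddle_point_estimate(size * size))
```

Each undirected path is counted twice here (once per orientation); the exhaustive search becomes impractical beyond a few tens of sites.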



5.1.2 "Helices" in a compact phase 

We now implement the competition between one- and three-dimensional
structures, by introducing a curvature energy $\epsilon_h$. As mentioned above,
this curvature energy favors aligned consecutive "monomers" (or disfavors
corners in the HP). Denoting by $N_{\rm corners}(HP)$ the number of corners in
a given (HP), the number of weighted (HP) is given by

$$\mathcal{N}_h = \sum_{(HP,\,{\rm corners})} e^{-\beta N_{\rm corners}(HP)\,\epsilon_h}$$

where the summation runs over all possible (HP)'s and over all possible
corners, and where $\beta = 1/(k_B T)$ is the inverse temperature.

Introducing d n-dimensional fields $\vec\varphi_{\alpha r}$, $\alpha = 1, 2, ..., d$,
for each lattice site $r$, it is shown in the Appendix that



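$$\mathcal{N}_h = \lim_{n\to 0}\frac{1}{n}\int \mathcal{D}\vec\varphi_{\alpha r}\; e^{-\frac{1}{2}\sum_{\alpha}\sum_{r,r'}\vec\varphi_{\alpha r}\,(\Delta_\alpha^{-1})_{rr'}\,\vec\varphi_{\alpha r'}}\;\prod_r\left(\frac{1}{2}\sum_{\alpha=1}^{d}\vec\varphi_{\alpha r}^{\,2} + e^{-\beta\epsilon_h}\sum_{\alpha<\gamma}\vec\varphi_{\alpha r}\cdot\vec\varphi_{\gamma r}\right) \qquad (5)$$

where $(\Delta_\alpha)_{rr'}$ is 1 if $r$ and $r'$ are nearest neighbours in
direction $\alpha$ and 0 otherwise.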

Performing a homogeneous and isotropic saddle point in equation (5) we get

$$\mathcal{N}_h = \left(\frac{q(\beta)}{e}\right)^N \qquad (6)$$

with an effective coordination number $q(\beta) = 2 + 2(d-1)e^{-\beta\epsilon_h}$.

A first order crystallization transition, describing the competition
between the entropy gain of making turns and the corresponding energy loss,
occurs for $q(\beta_c) = e$. For d = 3, the transition temperature is
$k_B T_c = 0.58\,\epsilon_h$. Below the transition, the entropy is not
extensive. The average length of a helix is given by

$$\ell_c = \frac{N\epsilon_h}{U(\beta_c)}$$

where $U(\beta) = -\frac{\partial}{\partial\beta}\log\mathcal{N}_h$ is the
internal energy. Just above the transition, in d = 3, the average helix length
is equal to $\ell_c = 3.78$, and is of order $\ell \sim N^{1/3}$
in the low temperature phase. Note that in this very simplified picture $\ell_c$
corresponds to a typical number of residues of the order of 15, since one
monomer corresponds to a helical turn, that is 3.6 residues. As seen from the
example of 1aps, and from numerical calculations using Hamiltonians such
as equation (1), this is indeed the typical length of α-helices in proteins.
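
The numbers quoted for the crystallization transition follow from the effective coordination number alone. The sketch below assumes the saddle point form $\log\mathcal{N}_h = N(\log q(\beta) - 1)$, so that $U/N = \epsilon_h\, 2(d-1)e^{-\beta\epsilon_h}/q(\beta)$ and the helix length is $N\epsilon_h/U$; temperatures are measured in units of $\epsilon_h/k_B$, and the function names are illustrative.

```python
import math

def q_eff(beta_eps, d=3):
    """Effective coordination number q = 2 + 2 (d - 1) exp(-beta eps_h)."""
    return 2.0 + 2.0 * (d - 1) * math.exp(-beta_eps)

def critical_beta_eps(d=3):
    """Solve q(beta_c) = e for beta_c * eps_h."""
    return -math.log((math.e - 2.0) / (2.0 * (d - 1)))

def helix_length(beta_eps, d=3):
    """Average helix length N eps_h / U, in units of lattice monomers."""
    corner_weight = 2.0 * (d - 1) * math.exp(-beta_eps)
    return q_eff(beta_eps, d) / corner_weight

beta_c = critical_beta_eps()
kt_c = 1.0 / beta_c                 # k_B T_c in units of eps_h
length_c = helix_length(beta_c)     # monomers, i.e. helical turns
residues_c = 3.6 * length_c         # 3.6 residues per helical turn

print(kt_c, length_c, residues_c)
```

At the transition the helix length reduces to e/(e - 2), independently of d; only the transition temperature depends on the dimension.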

These results follow from a homogeneous saddle point assumption (implying
the use of periodic boundary conditions). In a more correct treatment,
we expect a non-extensive surface entropy of order $N^{2/3}$ in the crystalline
phase (corresponding to the fact that the corners are on the surface of the
lattice). The influence of boundary conditions on the counting of (HP), with
or without curvature energy, is a rather difficult subject [11].

Finally, relaxing the constraint $\epsilon_v = \infty$ yields a phase diagram
where one may reach the crystallized phase either through a $\theta$ transition
followed by a second (liquid globule-crystal) transition, or through a unique
discontinuous coil-crystal transition. One may thus consider that there are
two coil "phases", one above the $\theta$ transition, and the other above the
crystallization transition, differing by their short range order.
Figure 9: Hydrogen bonds in a sheet



5.1.3 Fully compact "sheets" 

The extension to sheets (see Figure 9) can be done with a slight generalisation
of the Hamiltonian Path formalism: a node of a path is now to be interpreted
as an amino acid.

The model can be described in the following way. Consider a Hamiltonian
path. To mimic the formation of CO-HN, i.e. a hydrogen bond (H-bond),
in β-sheets, we allow an H-bond (energy gain $\epsilon_s$) whenever two pairs
of aligned links belong to two (non intersecting) neighbouring strands. We do
not make any distinction between parallel and antiparallel sheets. Following
the representation of the Appendix, and performing an isotropic homogeneous
saddle point, one also gets a first order crystallization transition. The
physics is very similar to the case of helices. Typical lengths of ordered
strands are however more difficult to estimate [13].

5.1.4 Conclusion on Hamiltonian Paths 

We have presented a simple model of the formation of secondary structures
in a dense phase, which is linked to polymer melting theory. Starting from
the coil state, one can reach the "compact state with secondary structures"
either directly or through other compact phase(s). These results rest on
a homogeneous saddle point approximation, which corresponds to periodic
boundary conditions. The "$n \to 0$" approach [14, 15] represents the chain via
a field $\vec\varphi_r$ and is therefore not appropriate to the description of
heteropolymeric properties (where the information depends on the curvilinear
abscissa i along the chain). In our presentation, helices and sheets were
treated on a different footing, since one monomer was a helical turn (3.6 amino
acids) in the former case or a single amino acid in the latter. This asymmetry
will be corrected below where we also consider the long range contribution of
the dipole-dipole interaction.

5.2 The dipolar chain 

Since the peptide bond has a large dipole moment, it seems rather natural 
to investigate the properties of the dipolar chain, which connects successive 
Cα carbon atoms. Representing the peptide bond by a dipole moment is
an approximation, and the dipolar interaction should be modified at short 
distances. The Hamiltonian of the model then reads 

$$\beta\mathcal{H} = \frac{v_0}{2}\sum_{i\neq j}\delta(\vec r_i - \vec r_j) + \frac{\beta}{2}\sum_{i\neq j}\sum_{\alpha,\gamma} p_i^\alpha\, G_{\alpha\gamma}(\vec r_i,\vec r_j)\, p_j^\gamma \qquad (7)$$

In equation (7), $\beta = 1/(k_B T)$ is the inverse temperature, $v_0$ is the
excluded volume, $\vec r_i$ denotes the spatial position of monomer i
(i = 1, 2, ..., N), and $\vec p_i$ its dipole moment. If necessary, three body
repulsive interactions may be introduced, to avoid collapse at infinite
density. The (infinite range) dipolar tensor reads

$$G_{\alpha\gamma}(\vec r,\vec r') = \frac{A}{|\vec r-\vec r'|^3}\left(\delta_{\alpha\gamma} - 3 v_\alpha v_\gamma\right) \qquad (8)$$

with $v_\alpha = \frac{(\vec r-\vec r')_\alpha}{|\vec r-\vec r'|}$, and A is a
prefactor containing the dielectric constant of the medium. The dipolar
interaction (8) is modified at small distances ($|\vec r - \vec r'| < a$) and
may also be cut off at large distances by an exponential prefactor. The
partition function of the model (7) is given by:

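$$Z = \int\prod_i d\vec r_i\, d\vec p_i\;\delta(|\vec r_{i+1}-\vec r_i| - a)\;\delta(|\vec p_i| - p_0)\;\delta(\vec p_i\cdot(\vec r_{i+1}-\vec r_i))\;\exp(-\beta\mathcal{H}) \qquad (9)$$

In equation (9), a denotes the Kuhn length of the monomers, and $p_0$ is
the magnitude of their dipole moment. The third δ-function constrains
the dipole moment $\vec p_i$ to be perpendicular to the chain (Figure 1), with
full rotation around bond (i, i + 1).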
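
The anisotropy of the dipolar tensor is easy to see on two dipoles. The sketch below evaluates $\vec p_i\cdot G\cdot\vec p_j$ with the prefactor A set to one and unit dipoles at unit distance (all of these are illustrative assumptions, as is the function name). Head-to-tail parallel dipoles attract, which is the ferroelectric-like arrangement along a helix axis; side-by-side dipoles attract only if antiparallel, which is the antiferroelectric-like arrangement between neighbouring strands of a sheet.

```python
import math

def dipolar_energy(p_i, p_j, r_i, r_j, prefactor=1.0):
    """Evaluate p_i . G(r_i, r_j) . p_j with G = A (delta - 3 v v) / |r_i - r_j|^3."""
    sep = [a - b for a, b in zip(r_i, r_j)]
    dist = math.sqrt(sum(c * c for c in sep))
    v = [c / dist for c in sep]
    pi_pj = sum(a * b for a, b in zip(p_i, p_j))
    pi_v = sum(a * b for a, b in zip(p_i, v))
    pj_v = sum(a * b for a, b in zip(p_j, v))
    return prefactor * (pi_pj - 3.0 * pi_v * pj_v) / dist ** 3

origin = (0.0, 0.0, 0.0)
up, down = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)

head_to_tail = dipolar_energy(up, up, origin, (0.0, 0.0, 1.0))
side_parallel = dipolar_energy(up, up, origin, (1.0, 0.0, 0.0))
side_antiparallel = dipolar_energy(up, down, origin, (1.0, 0.0, 0.0))

print(head_to_tail, side_parallel, side_antiparallel)
```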

Apart from numerical simulations, there are two main approaches to the 
dipolar chain: 

(i) one may try to integrate out the dipoles, and get an effective Hamilto- 
nian for the chain. Since dipoles favor anisotropic configurations, one expects 
that the above orthogonality constraint will lead to an anisotropic collapsed 
phase. 
(ii) one may tackle the full problem, and using what is known about
ferroelectric domains, one may guess low energy structures. 

The latter approach is not without risks, since the determination of
domain structures from first principles is an unsolved problem in non-soft
condensed matter physics. The former, which is a little more tractable, is an
extension of the hydrogen bond models (see Section 5.1), with one important
difference: the long range character of the dipole interaction implies that
the surface of the collapsed globule is itself a variational parameter. It also
allows for a first step in accounting for chirality.



5.3 Integrating the dipoles 

• Order parameters 

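For simplicity, we soften the constraint of fixed length dipoles in
equation (9) and replace it by a Gaussian constraint. We therefore have

$$Z_G = \int\prod_i d\vec r_i\, d\vec p_i\;\delta(|\vec r_{i+1}-\vec r_i| - a)\;e^{-\frac{\vec p_i^{\,2}}{2p_0^2}}\;\delta(\vec p_i\cdot(\vec r_{i+1}-\vec r_i))\;\exp(-\beta\mathcal{H}) \qquad (10)$$

where the subscript G on the partition function stands for Gaussian
and the Hamiltonian $\mathcal{H}$ is given by equation (7). Using the identity

$$\delta(\vec y) = \lim_{\Lambda\to\infty}\left(\frac{\Lambda}{2\pi}\right)^{3/2} e^{-\frac{\Lambda}{2}\vec y^{\,2}} \qquad (11)$$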



we may now perform the (Gaussian) integrals over the dipole moments
$\vec p_i$ in equation (10). As a result, the problem now depends only on
the polymeric degrees of freedom. Introducing the tensorial parameter
$Q_{\alpha\beta}(\vec r)$ by

$$Q_{\alpha\beta}(\vec r) = \sum_i\left((u_i)_\alpha (u_i)_\beta - \delta_{\alpha\beta}\right)\delta(\vec r - \vec r_i) \qquad (12)$$

where ($\alpha, \beta = x, y, z$) and the notation
$\vec u_i = (\vec r_{i+1} - \vec r_i)/a$ was used,
we can express the effective Hamiltonian as a function of the physical
order parameters

$$\rho(\vec r) = -\frac{1}{2}\,{\rm Tr}\, Q(\vec r) = \sum_i \delta(\vec r - \vec r_i) \qquad (13)$$
and

$$q_{\alpha\beta}(\vec r) = \sum_i\left((u_i)_\alpha (u_i)_\beta - \frac{\delta_{\alpha\beta}}{3}\right)\delta(\vec r - \vec r_i) \qquad (14)$$

In a way analogous to the previous section, the density $\rho(\vec r)$ is
appropriate for an isotropic $\theta$ transition, and the dielectric tensor
$q(\vec r)$ is appropriate for a (liquid) crystalline order. Using non rigorous
approximations (which are actually valid in a melt), we find that the dipolar
chain undergoes a second order $\theta$ transition from the coil phase to a
(liquid) collapsed phase, with order parameter $\rho(\vec r)$, followed, at
lower temperature, by a first order transition with order parameter
$q(\vec r)$. The ordered phase is a (liquid crystal) collapsed phase.

• Trying to include chirality 

If one follows liquid crystalline traditions [16], chirality is usually
represented as the simplest non trivial term in a Landau-like expansion
of the free energy in the order parameter $q(\vec r)$. It is well known that
chirality is not easy to take into account at a microscopic level [17], but
at a Landau free energy level, symmetry considerations lead, to lowest
order, to a chiral contribution

$$F_{\rm chiral} = \int D\,\epsilon_{\alpha\beta\nu}\, q_{\alpha\delta}\,\partial_\beta q_{\nu\delta}\; d^3r \qquad (15)$$

where $\epsilon_{\alpha\beta\nu}$ is the completely antisymmetric tensor and D
is a measure of the strength of the chirality (in fact several parameters are
in general needed).

Using the same non rigorous approximations as above, we find that the
$\theta$ transition is very weakly D-dependent, whereas the temperature of the
transition towards the liquid crystalline phase increases strongly with D [18].
This liquid crystalline phase is now modulated, and has strong similarities
with the liquid crystalline blue phase(s). Interestingly enough, for strong
enough chirality, we get a direct transition from a coil phase to a compact
phase with a modulated order parameter $q(\vec r)$. More specifically, one has

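$$q_{\alpha\beta}(\vec k) = \sum_i\left((u_i)_\alpha (u_i)_\beta - \frac{\delta_{\alpha\beta}}{3}\right) e^{i\vec k\cdot\vec r_i} \qquad (16)$$

in Fourier space (with $\vec k \neq 0$). The indices $\alpha, \beta$ are space,
not replica, indices. The order parameter is very similar for the case of
(idealized) helices and sheets where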
(i) an (ideal) helix is described by $\vec r_i = (u\cos v_i, u\sin v_i, v_i)$
and constant u.

(ii) an (ideal) sheet is described by p = 1, 2, ..., M strands where each
strand is described by a segment $\vec r_p = (u\cos v_p, u\sin v_p, v_p)$ and
constant $v_p$.

To summarize, we have found that within certain models and approximations,
we may get, for the dipolar chiral chain, a direct and discontinuous
transition from a coil phase to a compact phase with secondary
structures. The order parameter (16) describes both helix-like and
sheet-like conformations.

This model may seem oversimplified. For instance, the chiral term
(15) can be rewritten as
$\left(\vec\nabla f(\vec r_{ij})\cdot(\vec u_i\times\vec u_j)\right)(\vec u_i\cdot\vec u_j)$,
where $f(\vec r_{ij})$ describes a short range (in space) interaction (the
coordinates $\vec r_i$ are the coordinates of the virtual Cα chain). This term
does not compare well to the dihedral terms of the all-atom CHARMM energy which
read

$$\sum_{\rm dihedrals} k_\phi\left(1 + \cos(n\phi - \delta)\right)$$

where $\phi(1234) = \cos^{-1}(\hat n_{123}\cdot\hat n_{234})$ and
$\hat n_{abc}$ is the unit normal vector to the plane (a,b,c). Our modeling of
chirality, through a single parameter and a first order gradient term, is
rather primitive. In particular, higher order gradient terms, describing
shorter distances, are certainly important (see the example of the blue fog in
liquid crystalline blue phases) [19]. Furthermore, the precise spatial
organization of the compact ordered phase also depends on its surface.

With all these caveats, it should be mentioned that computing and
diagonalizing ($q_{\alpha\beta}(\vec r)$) for real proteins leads to a
reasonable characterization of secondary structures. Helices are essentially
uniaxial (two different eigenvalues), whereas sheets are biaxial (three
different eigenvalues) [20].
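
The uniaxial character of a helix can be illustrated on an idealized geometry. The sketch below is not the analysis of real proteins referred to above: it assumes an ideal α-helix of Cα atoms (radius 2.3 Å, rise 1.5 Å and 100 degrees per residue, i.e. 3.6 residues per turn), and it sums the traceless tensor of equation (14) over the whole segment instead of keeping the position dependence. For an integer number of turns the tensor comes out diagonal in the helix frame, with two equal entries and a distinct one along the axis.

```python
import math

def helix_points(n_points, radius=2.3, rise=1.5, turn_deg=100.0):
    """C-alpha positions of an ideal helix (assumed geometry, in Angstroms)."""
    w = math.radians(turn_deg)
    return [(radius * math.cos(w * i), radius * math.sin(w * i), rise * i)
            for i in range(n_points)]

def q_tensor(points):
    """Sum over bonds of u_a u_b - delta_ab / 3, with u the unit bond vector."""
    q = [[0.0] * 3 for _ in range(3)]
    for start, end in zip(points, points[1:]):
        bond = [end[k] - start[k] for k in range(3)]
        norm = math.sqrt(sum(c * c for c in bond))
        u = [c / norm for c in bond]
        for a in range(3):
            for b in range(3):
                q[a][b] += u[a] * u[b] - (1.0 / 3.0 if a == b else 0.0)
    return q

# 19 points give 18 bonds, i.e. exactly 5 turns of 3.6 residues.
q = q_tensor(helix_points(19))
trace = q[0][0] + q[1][1] + q[2][2]
max_off_diagonal = max(abs(q[a][b]) for a in range(3) for b in range(3) if a != b)

for row in q:
    print(row)
```

For a real structure one would diagonalize the tensor numerically, since the helix axis is not known in advance; the diagonal form obtained here is a consequence of the chosen frame.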
5.4 Not integrating the dipoles

The non integration of dipoles leads one to consider low energy dipolar struc- 
tures. I will first recall a few facts on ferroelectric domains (structure, order 
parameter,...). More speculative issues, such as a "biological" interpretation 
of defects of this order parameter, or the introduction of surfaces in protein 
folding, will be briefly examined. 

• Dipolar ordering 

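There is another way to consider the dipolar Hamiltonian, namely to
use the identity (see equation (8))

$$G_{\alpha\gamma}(\vec r,\vec r') = A\,\frac{\partial}{\partial r_\alpha}\frac{\partial}{\partial r'_\gamma}\,\frac{1}{|\vec r-\vec r'|} \qquad (17)$$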



Defining a local polarization $\vec P(\vec r) = \sum_i \vec p_i\,\delta(\vec r - \vec r_i)$,
one may transform the dipolar term of equation (7) into a Coulomb Hamiltonian,
with a (continuous) distribution of bulk
($\rho(\vec r) = -{\rm div}\,\vec P(\vec r)$) and surface
($\sigma(\vec r_s) = \vec P(\vec r_s)\cdot\vec N(\vec r_s)$) charges, where
$\vec r_s$ belongs to the surface, and $\vec N(\vec r_s)$ is the normal to the
surface at this point. Given the dimensions and discreteness of the system,
this continuum picture may not be very satisfactory [21], but we will
nevertheless use it.
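
The link between the dipolar tensor and the Coulomb kernel can be checked numerically. The sketch below compares the closed form $(\delta_{\alpha\gamma} - 3v_\alpha v_\gamma)/|\vec r-\vec r'|^3$ with a central finite-difference estimate of the mixed second derivative of $1/|\vec r-\vec r'|$ with respect to $r_\alpha$ and $r'_\gamma$. The prefactor A is set to one, and the two points and the step size are arbitrary choices made for illustration.

```python
import math

def inv_distance(r, rp):
    return 1.0 / math.sqrt(sum((a - b) ** 2 for a, b in zip(r, rp)))

def tensor_closed_form(r, rp):
    """(delta - 3 v v) / |r - r'|^3, with v the unit vector along r - r'."""
    sep = [a - b for a, b in zip(r, rp)]
    dist = math.sqrt(sum(c * c for c in sep))
    v = [c / dist for c in sep]
    return [[((1.0 if a == g else 0.0) - 3.0 * v[a] * v[g]) / dist ** 3
             for g in range(3)] for a in range(3)]

def shifted(point, axis, step):
    moved = list(point)
    moved[axis] += step
    return moved

def tensor_finite_difference(r, rp, h=1e-3):
    """Central difference for d/dr_a d/dr'_g of 1/|r - r'|."""
    out = [[0.0] * 3 for _ in range(3)]
    for a in range(3):
        for g in range(3):
            out[a][g] = (inv_distance(shifted(r, a, h), shifted(rp, g, h))
                         - inv_distance(shifted(r, a, h), shifted(rp, g, -h))
                         - inv_distance(shifted(r, a, -h), shifted(rp, g, h))
                         + inv_distance(shifted(r, a, -h), shifted(rp, g, -h))) / (4.0 * h * h)
    return out

r, rp = (1.0, 2.0, 0.5), (-0.5, 0.3, 1.2)   # arbitrary, well separated points
closed = tensor_closed_form(r, rp)
numeric = tensor_finite_difference(r, rp)
max_error = max(abs(closed[a][g] - numeric[a][g])
                for a in range(3) for g in range(3))
print(max_error)
```

The identity holds for distinct points only; the short-distance modification of the interaction mentioned earlier is not addressed by this check.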

A (low temperature) collapsed dipolar chain, if long enough, will break
into (Bloch, Weiss, Néel, ...) domains, as I now show on a simple
example. Let us consider an Ising chain, with short range exchange ($J_0$)
interactions between neighbouring monomers (see the Appendix and
reference [22]). At low temperature, the chain is collapsed and has a
uniform polarization. If one adds long range dipole-dipole interactions
($J_{dd}$), one may test the stability of the uniform state: flipping half
of the dipoles results in an energy cost of order
$\Delta E = +J_0 R^2 - J_{dd} R^3$,
where we have considered a spherical globule of radius R, and dropped
some numerical constants. For $R > R^* = J_0/J_{dd}$ (or
$N > N^* = (J_0/J_{dd})^3$), the system will break into domains.
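
The stability argument amounts to locating the sign change of $\Delta E$. The short sketch below makes this explicit; the coupling values are hypothetical and chosen only for illustration, and all numerical constants are dropped as in the text.

```python
def flip_energy(radius, j0, jdd):
    """Energy cost of flipping half of the dipoles: J0 R^2 - Jdd R^3."""
    return j0 * radius ** 2 - jdd * radius ** 3

def critical_radius(j0, jdd):
    """Radius R* above which the uniform state is unstable."""
    return j0 / jdd

def critical_size(j0, jdd):
    """Corresponding number of monomers N* ~ R*^3."""
    return critical_radius(j0, jdd) ** 3

J0, JDD = 5.0, 1.0   # hypothetical couplings, in arbitrary units

r_star = critical_radius(J0, JDD)
n_star = critical_size(J0, JDD)
below = flip_energy(0.8 * r_star, J0, JDD)   # positive: uniform state stable
above = flip_energy(1.2 * r_star, J0, JDD)   # negative: domains are favoured

print(r_star, n_star, below, above)
```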

For many dipolar systems, low energy domain structures in ferroelectricity
(and ferromagnetism) tend to have
$\rho(\vec r) = -{\rm div}\,\vec P(\vec r) = 0$ (pole
avoidance "principle" [22]). This "principle" is obeyed in the case of
helices (where $\vec P(\vec r)$ is a constant vector) and sheets (where
$\vec P(\vec r)$ has roughly antiferroelectric order). One may understand in a
similar way solenoidal proteins (where $\vec P(\vec r)$ looks like a curl, i.e.
has a circulating pattern) such as β-barrels or protein 1a4y (Figure 10).

Figure 10: Circulating CO dipoles in protein 1a4y

The fact that dipolar interactions couple the $\vec r_i$'s and the
$\vec p_i$'s implies that dipolar systems are very sensitive to the geometry.
For instance, the infinite simple cubic and face centered cubic lattices do not
have the same ground state order. For finite systems, the situation is even
more tricky, since dipolar interactions "feel" the surface of the system. The
formation of domains results in general from the competition between
dipolar and other (shorter range) interactions. The full determination
of domain structures (size, order parameter, spatial organization, ...) in
non-soft condensed matter physics depends on non-extensive terms in
the free energy, and remains an unsolved problem [22].

At a smaller (cluster) scale, Singer and coworkers [21] have studied 
the ground state of some dipolar clusters without the chain constraint. 
They pointed out that possible order parameters are the vectorial spher- 
ical harmonics (VSH) j2Hl, and that circulating patterns have then sim- 
ple expressions. Writing 
$$\vec P(\vec r) = \sum_{(JM)}\sum_{L} P^{L}_{JM}(r)\,\vec Y^{L}_{JM}(\theta,\phi)$$

one may follow what has been done [26] for (scalar) bond orientational
order in clusters. In this case, one expands the density as

$$\rho(\vec r) = \sum_{(LM)} Q_{LM}(r)\,Y_{LM}(\theta,\phi)$$

where the $Y_{LM}(\theta,\phi)$ are the ordinary spherical harmonics. The (VSH)
formalism is rather heavy, and I have done only preliminary calculations
on real proteins. As expected, helices are easier to analyze than sheets.

• Speculations on order parameters and defects 

One may also wonder whether the existence of an order parameter may
help us to understand the existence of an active site, see e.g. [27, 28].
The simplest idea I can think of is to view the active site as a defect
of this order parameter (see e.g. [29, 30]). By defect, I mean here a
topological defect.

In a first approach, we have integrated out the dipoles and found an
order parameter $q_{\alpha\beta}(\vec r)$, endowed with complicated line (and
other) defects [29, 30]. I will not consider these defects here.

If we do not integrate the dipoles, the order parameter is the local
polarization or the local electric field. But real proteins also have charged
residues: since dipoles tend to order along their local electric field
$\vec e$, there seems to be a competition between the div $\vec e = 0$ order of
the main chain and a div $\vec e \neq 0$ order of the charged residues. An
active site might tentatively result from this competition. One possibility is
to consider the topological defects of a vector field in three dimensions
(namely points and non singular textures). An interesting example is given in
ref. [31] where it was found that half of the electric flux through the
active site of some β-barrels came from the main chain. On the other
hand, non singular textures can be understood with the following
example: a (div $\vec e = 0$) ordering implies that one may write
$\vec e = {\rm curl}\,\vec c$. One may then calculate
$\int \vec e\cdot\vec c\; d^3r$ over the volume of the system, and
this quantity, if not zero, has topological meaning [32].

Two comments make these speculations even more speculative. First, to
associate the active site with some defects of an order is a classical (not
quantum) view. Moreover, the chain constraint has been forgotten: the
equilibrium state of a chain also depends on the chain conformational
degrees of freedom. Taking a flexible continuous string as an example,
mechanical equilibrium implies for instance that $T - V = \mathrm{const}$ along the
string, where $T$ is the tension of the string and $V$ the potential energy
per unit length. Any variation (or defect) in the electrostatic part $V$
will also show up in the conformational degrees of freedom (represented
here by $T$). It is interesting to note that recent work computes knot-related
invariants with respect to the $\vec r_i$'s degrees of freedom [33, 34, 35].
These invariants may have an electrostatic interpretation [36].

• Possible connections with more geometric approaches 

Various geometrical aspects of proteins have been recently stressed, 
mostly in relation with packing properties [37, 38]. An older connection
concerns surfaces [39, 40, 41, 42] which were argued to be important in
protein folding. 

From an electrostatic point of view, one could naively expect positive 
(resp. negative) charges of a protein to be in a negative (resp. positive) 
electrostatic potential. Since the main backbone chain has both types 
of charges, it should be close to an equipotential (not necessarily minimal)
surface. The same type of description applies to the hydrophilicity
variables ($\lambda_i$), and suggests a curve-on-an-interface (or surface) view of
a protein. 

Most prominent among these surfaces are minimal surfaces (i.e. surfaces
with zero mean curvature) [43], which have been introduced in
various physical problems, including blue phases [44]. The surface view
of proteins is interesting: the ideal secondary structures described by 
equation (fTB]) can also be described as the asymptotic curves of a heli- 
coid. There is ample room in the geometry of surfaces for the existence 
of an active "site" (focal surfaces, flat points,...), but more importantly, 
there are remarkable surface to surface transformations [43], with very
low energy barriers. Clearly, it would be interesting to implement these 
transformations on a computer.

6 Simplified models: heteropolymeric approaches

There are many recent reviews of the freezing transitions of various
heteropolymeric models, see e.g. [45, 46, 47, 48, 49]. I will therefore give only a
sketchy presentation of this approach. These models, which are something
like spin glasses [50] with a chain constraint, are difficult to solve, except in
high enough dimensions. The free energy of the frozen phase is determined,
as in spin glasses, by subtle non-extensive terms. One views a protein as
a random polymer with a fixed disordered sequence, corresponding to the
primary sequence. Analytical methods (I will not consider numerical
simulations here) broadly fall into two classes:

(i) replica calculations, where one averages the disorder over some distri- 
bution. 

(ii) calculations where the disorder is not averaged (TAP-like self consis- 
tent field equations, Imry-Ma arguments, variational calculations, dynamical 
equations,...). 

These models take a rather coarse grained view of the protein, so that a
residue $R_i$ is represented by a monomer $i$ at position $\vec r_i$, with some random
characteristics (polar character or hydrophilicity $\lambda_i$, charge $q_i$, ...). The disorder
is mostly included in two body interactions, with various distributions.
Denoting by $r_{ij}$ the distance between monomers $i$ and $j$, and by $f(r_{ij})$ a
short range interaction, some models of interest are

• The randomly charged (RC) chain 



(18) 



• The HP chain 




(19) 



For the particular value a = 0, the HP chain is often referred to as
the random hydrophilic hydrophobic (RHH) chain.



• The Hopfield (HO) chain

Each monomer $i$ has $M$ generalized charges ($q_i^p$; $p = 1, 2, \ldots, M$), so that

$$H_{HO} = \sum_{p=1}^{M} \sum_{i<j} q_i^p\, q_j^p\, f(r_{ij}) \tag{20}$$

Its coding properties can be studied in a way analogous to spin glasses.

• The Random Energy Model (REM) chain

It is a kind of $M \to \infty$ limit of the Hopfield chain.

$$H_{REM} = \sum_{i<j} v_{ij}\, f(r_{ij}) \tag{21}$$

with uncorrelated values of the pair interactions $v_{ij}$ and $v_{kl}$. As in
spin glasses [51], this model is a kind of fixed point for the physics of
heteropolymers [52].
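
As an illustration of the Hopfield and REM chains, the following Python sketch evaluates an energy of the form $H = \sum_{i<j} v_{ij} f(r_{ij})$, with Hopfield-like couplings $v_{ij} = \sum_p q_i^p q_j^p$. The step-like contact function `short_range_f`, the choice of $\pm 1$ charges and the small lattice conformation are assumptions made for the example only; they are not part of the models listed above.

```python
import math
import random

def short_range_f(r, r0=1.5):
    """Hypothetical short range interaction: 1 within the contact range r0, else 0."""
    return 1.0 if r <= r0 else 0.0

def hopfield_couplings(charges):
    """Pair couplings v_ij = sum_p q_i^p q_j^p, with charges[i][p] the p-th charge of monomer i."""
    n = len(charges)
    return [[sum(a * b for a, b in zip(charges[i], charges[j])) for j in range(n)]
            for i in range(n)]

def chain_energy(positions, couplings, f=short_range_f):
    """H = sum_{i<j} v_ij f(r_ij) for one conformation (list of coordinates)."""
    energy = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = math.dist(positions[i], positions[j])
            energy += couplings[i][j] * f(r_ij)
    return energy

rng = random.Random(0)
N, M = 6, 50  # chain length and number of generalized charges (illustrative values)
charges = [[rng.choice((-1.0, 1.0)) for _ in range(M)] for _ in range(N)]
v = hopfield_couplings(charges)

# A compact conformation on the square lattice (2 x 3 block), chosen by hand
positions = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
print("energy of the compact conformation:", chain_energy(positions, v))
```

For large `M` and independent charges, each `v[i][j]` is a sum of many independent terms, which is the sense in which the couplings become REM-like.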



Note that chirality is seldom included in these models, since it is not 
usually expressed as a two body interaction. The phase diagrams of these 
models are approximately known in the thermodynamic limit, in high enough 
dimensions $d$, and for independent disorder variables (e.g. $\lambda_i$).

As in the homopolymeric approach, there are (at least) two coil phases,
one above a $\theta$ transition and the other above a freezing transition (with slow
dynamics). In the case of the (RHH) chain, a Flory-Imry-Ma approach yields
a disorder dependent free energy

$$F_{RHH} = \frac{R^2}{N} + \left(v_0 + \beta\lambda_0 + \beta\lambda\,\frac{u}{\sqrt{N}}\right)\frac{N^2}{R^d} + w\,\frac{N^3}{R^{2d}} \tag{22}$$

where $\beta = 1/T$, $\lambda_0$ (resp. $\lambda$) is the mean (resp. variance) of the distribution of
hydrophilicities ($\lambda_i$), $u$ is a symmetric (Gaussian) random number of variance
one, and $v_0$, $w$ are two and three body interactions. In the coil phase, i.e.
at high enough temperature, one may characterize the short range order by
the $u$-dependent term. For $u > 0$ (hydrophilic fluctuation), the small $N$ behavior
(that is for $N < N(u) \sim u^2 \left(\frac{\beta\lambda}{v_0 + \beta\lambda_0}\right)^2$) can be extracted from a Flory estimate
$\frac{R^2}{N} \sim u\beta\lambda\,\frac{N^{3/2}}{R^d}$, yielding a branched polymer short range order [52]. On the
other hand, a region with $u < 0$ (hydrophobic fluctuation) is locally more
collapsed (with $R \sim N^{\frac{3}{2d}}$).
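
The two local exponents can be checked by simple power counting. The sketch below assumes that the Flory-Imry-Ma free energy has the form $R^2/N + (v_0 + \beta\lambda_0 + \beta\lambda u/\sqrt{N})\,N^2/R^d + w\,N^3/R^{2d}$, which is my reading of equation (22); the function names are hypothetical.

```python
from fractions import Fraction

def nu_hydrophilic(d):
    """u > 0: balancing R^2/N against u*beta*lambda*N^(3/2)/R^d gives R ~ N^(5/(2(d+2)))."""
    return Fraction(5, 2 * (d + 2))

def nu_hydrophobic(d):
    """u < 0: balancing |u|*beta*lambda*N^(3/2)/R^d against w*N^3/R^(2d) gives R ~ N^(3/(2d))."""
    return Fraction(3, 2 * d)

def crossover_size(u, beta, lam, lam0, v0):
    """N(u) ~ u^2 (beta*lam/(v0 + beta*lam0))^2: size below which the u-term dominates
    the mean two body term (prefactors of order one are ignored)."""
    return (u * beta * lam / (v0 + beta * lam0)) ** 2

for d in (2, 3, 4):
    print("d =", d, " hydrophilic nu =", nu_hydrophilic(d), " hydrophobic nu =", nu_hydrophobic(d))
```

Within this power counting, both exponents are smaller than the Flory value $3/(d+2)$ of a self avoiding walk in $d = 3$.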
Since proteins are not random, it is important to connect real primary
sequences to random ones. A conservative statement is that proteins have the
choice between a quick collapse transition towards the structureless $\theta$ globule
or a slow freezing transition towards a more structured (frozen) globule. A
possibility is that real sequences fold along a kind of Nishimori line 

[5T)] in the phase diagram (that is a line separating coils with different short 
range order). Another possibility is to introduce long range correlations 
(along the chain) in the distribution of the disorder [57].

Finally, as far as I can see, the existence of an active site is not a major 
issue in heteropolymeric models. 

7 Conclusion 

I have discussed some jSH] homo- and hetero- polymeric aspects of proteins. 
The main chain, which refers to the former point, suggests a connection 
between folding and ferroelectric (or ferromagnetic) domain theory, through 
the model of a chiral dipolar chain. On the other hand, the side chains 
point towards a (spin) glass analogy, if their physico-chemical properties 
are represented by disorder variables (charges, hydrophilicities,...). Different 
length scales related either to domain formation (e.g. $N^*$) and stability (e.g.
$l_c$), or to disorder fluctuations (e.g. $N(u)$), have been shown to arise in this
"dipolar Imry-Ma" problem p3] . 

A recent paper [60] studies the prediction of bubble and stripe domains
in uniaxial (Ising) ferromagnetic systems: the long range dipolar interaction
is relevant only to define the size of the individual bubbles and one may then 
treat the physics of the problem (bubble bubble interaction, bubble to stripe 
transition,...) through the use of short range interactions only. 

Following the appeal of domain theory leads one to consider the disorder
variables at the length scale of secondary structures, and not for individual
residues. As we have seen, this length scale is of order 10-20 residues for
helices and 5-8 residues for individual strands. The case of proteins is certainly
more complicated than the ferromagnet (chain constraint, solvent, ...),
but there has been important progress along somewhat similar lines [61].

As for the dynamics of the folding, a very puzzling question remains: 
"how does a protein find its way in phase space?" The answer may require
a detailed knowledge of the unfolded phase (short range order, topological 
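invariants or defects,...).

Appendix

Some properties of $n \to 0$ spins

Consider an $n$-dimensional classical spin $\vec S$ with

$$\vec S^{\,2} = \sum_{u=1}^{n} S_u^2 = n \tag{23}$$

Defining a normalized measure $d\mu(\vec S)$ on the ($n-1$) dimensional sphere, the
average of a function $A(\vec S)$ is given by

$$\langle A(\vec S) \rangle = \int d\mu(\vec S)\, A(\vec S)$$

An important example is the O($n$) symmetric function $f(\vec k) = f(k)$ defined
by

$$f(k) = \langle e^{i \vec k \cdot \vec S} \rangle$$

One has

$$\Delta f(k) = \sum_{u=1}^{n} \frac{\partial^2 f}{\partial k_u^2} = -\Big\langle \Big(\sum_{u=1}^{n} S_u^2\Big)\, e^{i \vec k \cdot \vec S} \Big\rangle = -n f(k) \tag{24}$$

Since, for a function of $k = |\vec k|$ only,

$$\Delta f(k) = \frac{d^2 f}{dk^2} + \frac{n-1}{k}\,\frac{df}{dk}$$

one finds that $\lim_{n \to 0} f(k) = 1 - \frac{k^2}{2}$, implying the following results:

$$\langle S_u^2 \rangle = 1$$

$$\langle S_u^p \rangle = 0 \quad \text{for } p > 2$$

$$\langle S_u S_v \rangle = 0 \quad \text{for } u \neq v$$

which imply

$$\langle e^{\vec\varphi \cdot \vec S} \rangle = 1 + \frac{\vec\varphi^{\,2}}{2}$$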
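
The moment rules can be checked with the closed formula for the even moments of one component of a vector uniformly distributed on a sphere, continued to real $n$. The sketch below uses the standard result $\langle x_1^{2p} \rangle = (2p-1)!!/[n(n+2)\cdots(n+2p-2)]$ on the unit sphere, rescaled to the radius $\sqrt{n}$; the function name is hypothetical, and treating $n$ as a real number is the usual analytic continuation, assumed here without proof.

```python
def spin_moment(n, p):
    """<S_u^(2p)> for a spin with S^2 = n, uniformly distributed on the sphere in n dimensions.

    Unit-sphere moment (2p-1)!! / (n (n+2) ... (n+2p-2)), multiplied by n^p
    for the radius sqrt(n), and evaluated for real n.
    """
    value = float(n) ** p
    for k in range(p):
        value *= (2 * k + 1) / (n + 2 * k)
    return value

# Second moment stays equal to 1; higher even moments vanish as n -> 0
for n in (3.0, 1.0, 1e-3, 1e-6):
    print(n, spin_moment(n, 1), spin_moment(n, 2), spin_moment(n, 3))
```

For $n = 1$ all even moments equal one, as they must for an Ising spin $S = \pm 1$, which is a simple sanity check of the formula.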



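Application to Self Avoiding Walks

Consider a lattice (with sites $\vec r$) and define an $n$-dimensional spin $\vec S_r$ at each lattice
site. Let us consider the following quantity:

$$Z_N = \int \prod_r d\mu(\vec S_r)\; \frac{1}{N!} \left( \sum_{<rr'>} \vec S_r\, \Delta_{rr'}\, \vec S_{r'} \right)^N \tag{25}$$

where $\Delta_{rr'} = 1$ if sites $\vec r$ and $\vec r'$ are nearest neighbours, 0 otherwise (and
$\sum_{<rr'>}$ is the sum over the bonds).

Using ($\langle S_u^2 \rangle = 1$ and $\langle S_u^p \rangle = 0$ for $p > 2$), we see that each vector $\vec S_r$
can only occur twice (or not at all) in $Z_N$. Since $\vec S_r \cdot \vec S_{r'} = \sum_{u=1}^{n} S_{ur} S_{ur'}$, the number $\mathcal{M}_N$ of closed
Self Avoiding Walks (SAW) of $N$ steps on the lattice is given by

$$\mathcal{M}_N = \lim_{n \to 0} \frac{1}{n}\, Z_N$$

More convenient representation of $\mathcal{M}_N$

Defining the grand canonical partition function

$$Z(K) = \sum_{N=0}^{\infty} Z_N K^N$$

we get

$$Z(K) = \int \prod_r d\mu(\vec S_r)\; e^{\frac{K}{2} \sum_{(r,r')} \vec S_r\, \Delta_{rr'}\, \vec S_{r'}} \tag{26}$$

where the sum ($\sum_{(r,r')}$) is now over the sites. The Hubbard-Stratonovich
transformation and the properties of $n$-dimensional spins, as $n \to 0$, yield

$$Z(K) = \int \prod_r \mathcal{D}\vec\varphi_r\; e^{-\frac{1}{2K} \sum_{(r,r')} \vec\varphi_r\, (\Delta^{-1})_{rr'}\, \vec\varphi_{r'}} \prod_r \left( 1 + \frac{\vec\varphi_r^{\,2}}{2} \right)$$

Since $Z_N$ is the coefficient of $K^N$ in $Z(K)$, we finally have

$$\mathcal{M}_N = \lim_{n \to 0} \frac{1}{n} \oint \frac{dK}{2\pi i\, K}\; K^{-N} Z(K)$$

• Fully compact SAW

Fully compact SAW's (i.e. Hamiltonian Paths) are obtained by taking
the monomer fugacity $K \to \infty$. We therefore get

$$\mathcal{N}_N = \lim_{n \to 0} \frac{1}{n} \int \prod_r \mathcal{D}\vec\varphi_r\; e^{-\frac{1}{2} \sum_{(r,r')} \vec\varphi_r\, (\Delta^{-1})_{rr'}\, \vec\varphi_{r'}} \prod_r \frac{\vec\varphi_r^{\,2}}{2} \tag{27}$$

A homogeneous saddle point on $\vec\varphi_r$ gives

$$\mathcal{N}_N \simeq \left( \frac{q}{e} \right)^N$$

where $q = 2d$ is the lattice coordination number. Alternatively, we
could have avoided the grand canonical approach and established the
equality of the two members of equation (27) through the use of Wick's
identity

$$\langle \varphi_u(\vec r)\, \varphi_v(\vec r') \rangle = \delta_{uv}\, \Delta_{rr'}$$

with ($u, v = 1, 2, \ldots, n$).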
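
The mean field estimate $(q/e)^N$ can be compared with a brute force enumeration on very small lattices. The sketch below counts directed, open Hamiltonian paths on a finite square lattice with free boundaries by depth-first search. This setting (open paths, free boundaries, $d = 2$, tiny sizes) is an assumption chosen for simplicity, so only a rough agreement of the growth rates should be expected.

```python
import math

def count_hamiltonian_paths(width, height):
    """Count directed Hamiltonian paths (fully compact SAWs) on a width x height square lattice."""
    n_sites = width * height
    visited = [[False] * height for _ in range(width)]

    def extend(x, y, length):
        # A path that has visited every site is complete
        if length == n_sites:
            return 1
        total = 0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and not visited[nx][ny]:
                visited[nx][ny] = True
                total += extend(nx, ny, length + 1)
                visited[nx][ny] = False
        return total

    count = 0
    for x in range(width):
        for y in range(height):
            visited[x][y] = True
            count += extend(x, y, 1)
            visited[x][y] = False
    return count

def mean_field_growth_rate(d=2):
    """log(q/e) with q = 2d, the mean field value of log(number of paths)/N."""
    return math.log(2 * d / math.e)

for size in (2, 3, 4):
    n = size * size
    exact = count_hamiltonian_paths(size, size)
    print(size, exact, math.log(exact) / n, mean_field_growth_rate(2))
```

The enumeration cost grows exponentially with the number of sites, so this check is limited to a few tens of sites at most.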
Fully compact "helices": 

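Introducing a curvature energy to disfavor corners in the Hamiltonian
Paths, the partition function reads

$$\mathcal{N}_h = \sum_{(HP,\,corners)} e^{-\beta N_{corners}(HP)\, \varepsilon_h}$$

Let us now introduce, for each site ($\vec r$) of the lattice, $d$ $n$-dimensional
fields $\vec\varphi_{\alpha r}$ ($\alpha = 1, 2, \ldots, d$). Generalizing equation (27), we now obtain

$$\mathcal{N}_h = \lim_{n \to 0} \frac{1}{n} \int \prod_{\alpha, r} \mathcal{D}\vec\varphi_{\alpha r}\; e^{-\frac{1}{2} \sum_{\alpha} \sum_{(r,r')} \vec\varphi_{\alpha r}\, (\Delta_\alpha^{-1})_{rr'}\, \vec\varphi_{\alpha r'}} \prod_r \left( \frac{1}{2} \sum_{\alpha=1}^{d} \vec\varphi_{\alpha r}^{\,2} + e^{-\beta \varepsilon_h} \sum_{\alpha < \gamma} \vec\varphi_{\alpha r} \cdot \vec\varphi_{\gamma r} \right)$$

The identity between the two expressions of $\mathcal{N}_h$ relies on Wick's theorem.

Performing a homogeneous and isotropic saddle point in this integral representation,
we get

$$\mathcal{N}_h \simeq \left( \frac{q(\beta)}{e} \right)^N$$

with an effective coordination number $q(\beta) = 2 + 2(d-1)\, e^{-\beta \varepsilon_h}$.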
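
The temperature dependence of the effective coordination number is easy to tabulate. The sketch below evaluates $q(\beta) = 2 + 2(d-1)e^{-\beta\varepsilon_h}$ and the mean field value $\log(q(\beta)/e)$ of $\log \mathcal{N}_h / N$, and locates the inverse temperature at which $q(\beta) = e$. Reading this sign change as a marker of the crystallization of the chain is an interpretation assumed here; the function names and the unit choice $\varepsilon_h = 1$ are hypothetical.

```python
import math

def effective_coordination(beta, eps_h, d=3):
    """q(beta) = 2 + 2 (d - 1) exp(-beta * eps_h): two straight moves, 2(d-1) corner moves."""
    return 2.0 + 2.0 * (d - 1) * math.exp(-beta * eps_h)

def log_partition_per_monomer(beta, eps_h, d=3):
    """Mean field value of log(N_h)/N, namely log(q(beta)/e)."""
    return math.log(effective_coordination(beta, eps_h, d) / math.e)

def beta_where_q_equals_e(eps_h, d=3):
    """Inverse temperature solving q(beta) = e (requires d >= 2)."""
    return -math.log((math.e - 2.0) / (2.0 * (d - 1))) / eps_h

beta_star = beta_where_q_equals_e(1.0)
print("beta* =", beta_star, " T* =", 1.0 / beta_star)
for beta in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(beta, effective_coordination(beta, 1.0), log_partition_per_monomer(beta, 1.0))
```

At $\beta = 0$ one recovers $q = 2d$, and at low temperature $q(\beta) \to 2$, which corresponds to straight rod-like segments.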
Fully compact "sheets": 

Denoting by $N_{bonds}(HP)$ the number of H-bonds in a given (HP), the
partition function reads

J\f s= ^ e"^ N bonds(HP) 

(HP,bonds) 

where the summation runs over all possible (HP)'s and over all possible 
sets of H-bonds compatible with this path. 

The formalism is more complicated since the integral representation of
$\mathcal{N}_s$ requires, for each direction $\alpha$:

(i) an $n$-component field $\vec\varphi_\alpha(\vec r)$ to generate the (HP), with $n \to 0$.

(ii) two scalar fields which respectively initiate and
terminate an H-bond at site $\vec r$ in direction $\alpha$.

We also have 

TV = lim Q I/^a#a# + a ^ Ag U r D{r) 



'n / dip a dtp a difj + a e A ° 

where the (normalizing) denominator is due to the introduction of the 
two scalar fields and where 



with 



and 



£>(r)=£~</^ G a (r) + £ & (0 • 0s (0 

a a<5 






The operator A~ , has the same meaning as above, and A~t, is 1 iff 
f ' = r+e a , e a being the unit vector in direction a. The two expressions 
of J\f s can be identified through Wick's theorem: 



^ (f) • (f ') = 6 u J ay A? r , 



Performing a homogeneous and isotropic saddle point on these fields
leads to a crystallization transition similar to
the case of helices.



The Ising chain

$$Z = \sum_{SAW}\ \sum_{S_i = \pm 1} \exp\left( \frac{\beta J}{2} \sum_{i \neq j} S_i\, \Delta_{r_i r_j}\, S_j \right) \tag{28}$$

where $J$ is the exchange energy. The sums run over all possible SAW and all
spin configurations. By using a Gaussian transform, it is possible to rewrite
$Z$ as

$$Z = 2^N \int \prod_r d\varphi_r\; \exp\left( -\frac{1}{2\beta J} \sum_{\{r,r'\}} \varphi_r\, \Delta^{-1}_{rr'}\, \varphi_{r'} + \log \sum_{SAW\{r_i\}} \prod_{i=1}^{N} \cosh(\varphi_{r_i}) \right) \tag{29}$$

Mean-field theory can be obtained by performing a saddle-point approximation
on equation (29). We assume that the chain is confined in a volume $V$
with a monomer density $\rho = N/V$. Assuming a translationally invariant field
$\varphi$, the mean field free energy per monomer is

$$f = \frac{F}{N} = -T \log 2 + \frac{T^2}{2\rho J q}\, \varphi^2 - \frac{T}{N} \log Z_{SAW} - T \log\cosh(\varphi) \tag{30}$$

where $q = 2d$ and $Z_{SAW}$ is the total number of SAW of $N$ monomers confined
in a volume $V$. It is easily seen that

$$Z_{SAW} \simeq \left( \frac{q}{e} \right)^N \exp\big( -V (1-\rho) \log(1-\rho) \big) \tag{31}$$

so that

$$f = -T \log 2 + \frac{T^2}{2\rho J q}\, \varphi^2 - T \log\frac{q}{e} + T\, \frac{1-\rho}{\rho} \log(1-\rho) - T \log\cosh(\varphi) \tag{32}$$

This free energy is to be minimized with respect to $\varphi$ and $\rho$, yielding a
discontinuous transition to a compact ordered phase [23].
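
The minimization can be explored numerically. The sketch below assumes that the mean field free energy per monomer reads $f = -T\log 2 + \frac{T^2}{2\rho J q}\varphi^2 - T\log\frac{q}{e} + T\frac{1-\rho}{\rho}\log(1-\rho) - T\log\cosh\varphi$, which is my reading of equation (32), and minimizes it on a plain grid in ($\varphi$, $\rho$). The grid sizes, the cutoff `phi_max`, the lower density bound and the units $J = 1$ are arbitrary choices made for the example.

```python
import math

def free_energy_per_monomer(phi, rho, T, J=1.0, d=3):
    """Assumed mean field free energy f(phi, rho) of the Ising chain, with q = 2d."""
    q = 2 * d
    return (-T * math.log(2.0)
            + T * T * phi * phi / (2.0 * rho * J * q)
            - T * math.log(q / math.e)
            + T * (1.0 - rho) / rho * math.log(1.0 - rho)
            - T * math.log(math.cosh(phi)))

def minimize_on_grid(T, J=1.0, d=3, n_rho=200, n_phi=400, phi_max=12.0):
    """Return (f, phi, rho) at the grid minimum; rho is restricted to [1e-3, 0.999]."""
    best = None
    for a in range(n_rho):
        rho = 1e-3 + (0.999 - 1e-3) * a / (n_rho - 1)
        for b in range(n_phi):
            phi = phi_max * b / (n_phi - 1)
            f = free_energy_per_monomer(phi, rho, T, J, d)
            if best is None or f < best[0]:
                best = (f, phi, rho)
    return best

for T in (1.0, 1.5, 2.0, 3.0):
    f_min, phi_min, rho_min = minimize_on_grid(T)
    print("T =", T, " rho =", round(rho_min, 3), " phi =", round(phi_min, 3), " f =", round(f_min, 4))
```

In this sketch the density at the minimum jumps between the lowest grid value (coil, $\varphi = 0$) and a value close to one (compact, $\varphi \neq 0$) as the temperature is lowered. For temperatures much below $J$ the cutoff `phi_max` should be increased, since the optimal $\varphi$ grows like $\rho J q / T$.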